# For binary classification, subset iris so that Species (which has three categories) keeps only two of them
> iris_bin1<-subset(iris, Species=='setosa'|Species=='versicolor')
> str(iris_bin1)
'data.frame': 100 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> result<-glm(data=iris_bin1, Species~., family='binomial')
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
# The control argument can address the convergence problem (here the iteration limit is raised with maxit)
> result<-glm(data=iris_bin1, Species~., family='binomial', control=list(maxit=50))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> pairs(iris_bin1, col=iris_bin1$Species)
> result<-glm(data=iris_bin1, Species~Petal.Width, family='binomial', control=list(maxit=50))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> result<-glm(data=iris_bin1, Species~Petal.Length, family='binomial', control=list(maxit=50))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
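# The warning that remains after raising maxit is the usual symptom of complete separation: a predictor splits the two species with no overlap, so the fitted probabilities are pushed to 0 or 1. The following is a minimal sketch of how that could be checked; the helper name separates is hypothetical and is not part of the original session.

```r
# Rebuild the two-species subset so the block runs on its own
iris_bin1 <- subset(iris, Species == 'setosa' | Species == 'versicolor')
iris_bin1$Species <- droplevels(iris_bin1$Species)  # drop the unused 'virginica' level

# Hypothetical helper: TRUE when the value ranges of the two groups do not overlap
separates <- function(x, g) {
  r <- tapply(x, g, range)  # list holding c(min, max) for each species
  r[[1]][2] < r[[2]][1] || r[[2]][2] < r[[1]][1]
}

# Apply the check to each of the four numeric predictors
sep <- sapply(iris_bin1[, 1:4], separates, g = iris_bin1$Species)
print(sep)
```

# In this subset the check flags Petal.Length and Petal.Width but not Sepal.Length or Sepal.Width, which matches the session: the petal models keep the warning, while the Sepal.Length model below fits without one.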
> result<-glm(data=iris_bin1, Species~Sepal.Length, family='binomial')
> summary(result)
Call:
glm(formula = Species ~ Sepal.Length, family = "binomial", data = iris_bin1)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.05501  -0.47395  -0.02829   0.39788   2.32915
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -27.831      5.434  -5.122 3.02e-07 ***
Sepal.Length    5.140      1.007   5.107 3.28e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.629 on 99 degrees of freedom
Residual deviance: 64.211 on 98 degrees of freedom
AIC: 68.211
Number of Fisher Scoring iterations: 6
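# The coefficients in this summary are on the log-odds scale. As a rough sketch of how they might be read, the block below converts the slope to an odds ratio, finds the Sepal.Length at which the fitted probability of versicolor is 0.5, and predicts for a few values. The names odds_ratio, boundary and new_flowers, and the three Sepal.Length values, are illustrative assumptions and do not come from the original session.

```r
# Refit the Sepal.Length model so the block runs on its own
iris_bin1 <- subset(iris, Species == 'setosa' | Species == 'versicolor')
result <- glm(Species ~ Sepal.Length, data = iris_bin1, family = 'binomial')

b <- coef(result)

# Multiplicative change in the odds of versicolor per 1 cm of Sepal.Length
odds_ratio <- exp(b[["Sepal.Length"]])

# Sepal.Length at which the linear predictor is 0, i.e. P(versicolor) = 0.5
boundary <- -b[["(Intercept)"]] / b[["Sepal.Length"]]

# Fitted probabilities of versicolor for a few illustrative lengths
new_flowers <- data.frame(Sepal.Length = c(5.0, 5.5, 6.0))
p <- predict(result, newdata = new_flowers, type = 'response')

print(odds_ratio)
print(boundary)
print(p)
```

# With a factor response, glm treats the first level (setosa) as failure, so the predicted probabilities refer to versicolor.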
# Compute the p-values directly to decide whether to reject
> 1-pchisq(138.629, df=99)
[1] 0.005302078
# Reject: this is a lack-of-fit verdict, so the intercept-only model is not close to the saturated model.
> 1-pchisq(64.211, df=98)
[1] 0.9966935
# Cannot reject: the model that includes the independent variable (Sepal.Length) is close to the saturated model.
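# The two p-values above were typed in from the printed deviances. A sketch of the same calculation taken straight from the fitted object is given below, which avoids copying rounded numbers. The last step, a likelihood-ratio test on the difference of the two deviances, is an addition that the original session does not perform; the variable names p_null, p_resid, lr_stat and p_lr are assumptions.

```r
# Refit the Sepal.Length model so the block runs on its own
iris_bin1 <- subset(iris, Species == 'setosa' | Species == 'versicolor')
result <- glm(Species ~ Sepal.Length, data = iris_bin1, family = 'binomial')

# Same tests as above; lower.tail = FALSE is equivalent to 1 - pchisq(...)
p_null  <- pchisq(result$null.deviance, df = result$df.null, lower.tail = FALSE)
p_resid <- pchisq(deviance(result), df = df.residual(result), lower.tail = FALSE)

# Added step (not in the original): test the drop in deviance from adding Sepal.Length
lr_stat <- result$null.deviance - deviance(result)
p_lr    <- pchisq(lr_stat, df = result$df.null - df.residual(result), lower.tail = FALSE)

print(c(p_null = p_null, p_resid = p_resid, lr_stat = lr_stat, p_lr = p_lr))
```

# The drop in deviance is 138.629 - 64.211 = 74.418 on 1 degree of freedom, so the likelihood-ratio test also points to Sepal.Length improving on the intercept-only model.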